Building a fit-for-purpose modern data stack, part 1.

Author: Moss Pauly
Published: February 13, 2023

In this multi-part blog post, Sydney-based Zipster Moss Pauly, Senior Manager, Data Products, introduces our ANZ data platform and the team’s journey over the last 18 months.
The broader Zip Data & Analytics and Engineering teams have led the transition from a fairly standard lakehouse implementation (S3, custom scripts and AWS Athena) to a modern data stack (dbt, Fivetran, Airbyte, Snowflake, Census, Snowplow, Airflow). While the platform machinery has changed extensively, our consumption layer has remained relatively constant throughout, with data scientists primarily using Databricks and our analytics team producing BI reporting in Tableau.
In this first part of the article - aimed at supporting others through the complex process of implementing a modern tech stack - Moss shares his insights on how to establish a framework to guide decision making on the various components that now make up Zip’s stack.
In part two, we take a deep dive into the six key decisions the team tackled, and the valuable experiences and key learnings Moss and the team gathered along the way. We hope you enjoy your read!
With the new year well underway, I’ve been reflecting on this journey from old to new. We were lucky enough to have a great team to navigate the many decisions needed, as well as access to a fantastic data community here in Sydney, Australia.
Even with a supportive community and an internet worth of reading material, I would have loved an in-depth article that went into the decision-making process of building a modern data stack and selecting components.
With that in mind, here’s the article I wish I’d had access to at the start of our journey.
Start with people’s experiences
Chances are that no matter what problem you’re looking at, you’re not the first. One of the most valuable things you can do is learn from those that have tackled this problem before you and reach out to your peers.
What do you like about your stack and what would you do differently next time?
I can’t remember how many times I asked this question throughout this journey. It must have been pushing upwards of 30 across a lot of different companies and people. When you’re early on in the process, you cannot ask this question enough. Generally speaking, people will be far more diplomatic in writing than in conversation, and we found that in order to really get a deep understanding of their experiences with different tools, nothing beat having a chat over a beer or two. We found pretty quickly in these chats that there were common callouts that really helped guide our decision-making.
Another great resource for understanding people’s experience with different components of the stack is the vendors who integrate against them. Kick off conversations with a number of the providers you’re considering at the start.
Everyone in the modern data stack space I’ve talked to is really friendly and passionate about data and the challenges around data. If you’re talking to someone about egress, ask them what data warehouse most of their customers are using and what trends they’re seeing. If you’re talking to someone about a data warehouse, ask them what they’re seeing for transformations or egress. These vendors are privy to a bird's eye view of the landscape, and that’s really valuable to tap into.
Decision framework and process
We didn’t want to make decisions on components in this stack lightly. Rigour is really important here. At this point, I’ll cover how we evaluated decisions and some additional considerations.
The biggest benefit of the modern data stack is tight integration, with each component solving its own domain excellently. Decisions become easier as you lock in more components, because you know exactly what each new one has to integrate with. At the start you don’t have this, so you need a clear vision of the problem spaces you’re solving and the players you might consider.
As a starting point, we considered the following:
- Event Collection
- Data Ingress
- Data Warehousing
- Data Transformation
- Data Egress
Cost scalability is a key consideration for us. We’ve been burnt before by event volumes, so we went into cost scalability with eyes wide open. We evaluated this, along with how well each component would fit our team, in the following way:
- SaaS that can migrate to open source is a massive plus. This means that we can reduce time to value initially and always have an option to control costs if we need to.
- Any paid component is evaluated at 1x, 2x and 4x expected volumes. This gives us an idea of the economy of scale.
- All decisions are made after we’ve done a proof of concept (POC), got our hands dirty and actually played with the tool. Some things are great on paper, but the workflows can be sub-optimal.
- Anything we want to deploy and manage ourselves has to run on a container platform and slot in with our operational tooling.
- Look at the supporting community for each tool. The larger and more accessible it is, the better.
- Listen when people talk about how delighted they are with something, and follow your gut.
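To make the 1x, 2x and 4x check concrete, here is a minimal Python sketch of how that comparison could be run. It assumes a vendor with graduated, volume-tiered pricing. The tier boundaries, the prices and the names `Tier`, `monthly_cost` and `scaling_report` are all hypothetical; they are not taken from any real vendor or from our own evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One pricing tier (hypothetical): events up to `upper_bound` are billed
    at `price_per_thousand` dollars per 1,000 events."""
    upper_bound: float        # cumulative monthly events covered by this tier
    price_per_thousand: float


# Made-up example tiers, ordered by upper_bound. Not real vendor pricing.
EXAMPLE_TIERS = [
    Tier(upper_bound=1_000_000, price_per_thousand=1.0),
    Tier(upper_bound=5_000_000, price_per_thousand=0.5),
    Tier(upper_bound=float("inf"), price_per_thousand=0.2),
]


def monthly_cost(volume: int, tiers: list[Tier]) -> float:
    """Cost of `volume` events under graduated pricing: each tier's price
    applies only to the events that fall inside that tier."""
    cost = 0.0
    lower = 0.0
    for tier in tiers:
        if volume <= lower:
            break
        billable = min(volume, tier.upper_bound) - lower
        cost += billable / 1000 * tier.price_per_thousand
        lower = tier.upper_bound
    return cost


def scaling_report(expected_volume: int, tiers: list[Tier],
                   multipliers: tuple[int, ...] = (1, 2, 4)):
    """Return (multiplier, volume, total cost, cost per event) for each
    multiple of the expected volume."""
    rows = []
    for m in multipliers:
        volume = expected_volume * m
        cost = monthly_cost(volume, tiers)
        rows.append((m, volume, cost, cost / volume))
    return rows


if __name__ == "__main__":
    for m, volume, cost, unit in scaling_report(1_000_000, EXAMPLE_TIERS):
        print(f"{m}x  {volume:>10,} events  ${cost:>10,.2f}  ${unit:.6f}/event")
```

With these made-up tiers the cost per event falls as volume doubles, which is the kind of economy of scale this check is meant to surface. A flat or rising cost per event at 4x would be a signal to dig deeper before committing.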
Lastly, document your decisions thoroughly. You’ve probably spent weeks researching, comparing and testing options here, so this is in your best interests.
- Capture the options you considered with the pros and cons for each, and a clear articulation of your recommendation. It helps to clarify, compare, and take decision-makers on the journey.
- You (or someone else) may need to return to a decision in the future, and it helps to restore a detailed understanding of the context available at the time.
- It’s likely going to result in you asking for an investment from your business, so having an in-depth articulation is much more likely to get you money than a strong verbal suggestion.
- If you have a process in your organisation for socialising and endorsing strategic decisions, use it. If you don’t, set up something lightweight that involves key stakeholders (and budget approvers).
In the next part of the article, I share the six key decisions we landed on, detailing our requirements, the path we chose and some tips you may find useful based on our experiences as a team.
Read part 2 of ‘Building a fit-for-purpose modern data stack’ now.