Stack Savvy: How Instacart, Wish and Aalto Harness the Power of Data

Used effectively, data has the power to help tech companies achieve greatness. Learn more about how these three Bay Area tech leaders tackle data with smart tech stacks.

Written by Jenny Lyons-Cunha
Published on Aug. 23, 2022

Chunky plastic sunglasses, hanging macrame planters, chilled pints of oat milk, a summer squash with the perfect saffron hue. All of these disparate treasures have something in common: They can be delivered directly to your doorstep in a matter of hours. 

Apps like Instacart and Wish launch items from refrigerated shopping aisles to users’ front doors in an impressively short timeframe. Meanwhile, services like Aalto can even help buy or sell the doorstep itself. These modern magic tricks are powered by massive amounts of data, which tech leaders Instacart, Wish and Aalto leverage to provide a comprehensive customer experience.  

“One unique aspect is the breadth of data we need to utilize to run our platform,” Instacart Senior Manager of Data Science Nick Gordenier told Built In San Francisco. “We have e-commerce data to optimize our site, user events and auction data for our advertising product, as well as offline brick-and-mortar stock and on-hand data to make sure we only show products that we can deliver.”  

At Wish, Staff Data Scientist Max Li revels in the human insight his team harvests on a daily basis. “What I love about data is that it provides insights into customer behavior,” Li said. “With this data, we are able to train machine-learning models to understand millions of product listings and improve customer and merchant experiences.” 

Nathan Mayer, business operations lead at Aalto, sees data as an advantage for both the company and its customers. “In addition to the data our platform generates, we leverage multiple large public data sets,” Mayer said. “This allows us to not only extract insights that will help drive our business, but to surface the right insights to users so they can make the best decisions throughout the process of selling or buying their home.”

Whether “munging massive data sets or analyzing data with scientific rigor,” as Li describes it, “data gurus” must employ the perfect tech stack for their company’s needs. Built In San Francisco connected with data experts from Instacart, Wish and Aalto to learn more about the technology they use to create data magic.  

 

Instacart

 

Nick Gordenier
Senior Manager, Data Science • Instacart

 

Instacart offers shoppers same-day delivery and pickup services to provide fresh groceries and everyday essentials to busy people and families across the U.S. and Canada. Instacart’s massive operation is driven by data collected from numerous sources. Integral to the company’s successful use of data is the savviness of its team. “There is an extremely high degree of data literacy here,” said Senior Manager of Data Science Nick Gordenier. 

 

Describe your data stack.

We use a variety of data tools for storage and computation, as well as a combination of batch and streaming processes to collect data from numerous sources — including our app, third parties, our retailers and more. These land in S3 or are consumed by additional real-time systems. 

We then leverage a combination of Snowflake and Databricks for computation and analysis. On top of these warehouses, we use Airflow, DBT and in-house tools for orchestration and data modeling. This makes pipelining easy for our data scientists and data engineers.

This collection of tools allows us to distribute the burden of organizing, generating, transforming and scheduling data across data science, engineering and analytics, which gives us more flexibility in our day-to-day projects and in how teams use data.
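To make the idea of orchestration concrete, here is a minimal sketch of what a scheduler such as Airflow does at its core: run tasks in an order that respects their dependencies. It uses only Python's standard library rather than Airflow itself, and the task names are hypothetical placeholders, not Instacart's actual pipeline.

```python
from graphlib import TopologicalSorter

# Hypothetical task names; each task maps to the set of tasks it depends on.
PIPELINE = {
    "land_raw_events": set(),
    "load_warehouse": {"land_raw_events"},
    "build_models": {"load_warehouse"},
    "refresh_dashboards": {"build_models"},
    "train_forecast": {"build_models"},
}


def run_order(dependencies):
    """Return one valid order in which every task follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, actions):
    """Call each task's function in dependency order and return the order used."""
    order = run_order(dependencies)
    for task in order:
        actions[task]()
    return order


if __name__ == "__main__":
    log = []
    # Each placeholder action just records that its task ran.
    actions = {name: (lambda n=name: log.append(n)) for name in PIPELINE}
    run_pipeline(PIPELINE, actions)
    print(log)
```

A real orchestrator adds scheduling, retries and monitoring on top of this ordering step, which is part of why teams reach for a dedicated tool.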

 

INSTACART’S STACK SAVVY 

  • S3
  • Snowflake
  • Databricks
  • Airflow
  • DBT
  • Amundsen

 

How does your organization use data?

Data is at the core of decision-making at Instacart. We use it in product and strategy development, business intelligence and real-time modeling, to name a few areas.

This breadth of data and use cases comes with challenges. We must choose the right technology to accomplish each task: from warehousing and offline analytics to real-time predictions and accounting.


 

How has your stack evolved over time?

The next few years will bring a variety of challenges for our data infrastructure. As we continue to scale our data teams — as well as the volume of data we are using — we will need to think through access controls, documentation and discovery tools. 

This year, we kicked off several projects to put our documentation into code so that it updates as our data evolves. This doesn’t solve all of our documentation challenges, but it is a big step forward for scalability.
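One way to picture documentation that lives in code is sketched below: table and column descriptions sit next to the schema definition, a function renders the data dictionary from them, and a check flags anything left undocumented. This is an illustrative assumption about the general approach, not Instacart's tooling, and the `orders` table and its columns are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    dtype: str
    description: str


@dataclass(frozen=True)
class Table:
    name: str
    description: str
    columns: tuple


def render_docs(table):
    """Render a plain-text data dictionary entry from the table definition."""
    lines = [f"{table.name}: {table.description}"]
    for col in table.columns:
        lines.append(f"  - {col.name} ({col.dtype}): {col.description}")
    return "\n".join(lines)


def undocumented(table):
    """List columns with a blank description so an automated check can flag them."""
    return [col.name for col in table.columns if not col.description.strip()]


# Hypothetical table used for illustration only.
ORDERS = Table(
    name="orders",
    description="One row per customer order.",
    columns=(
        Column("order_id", "string", "Unique order identifier."),
        Column("placed_at", "timestamp", "When the order was placed (UTC)."),
        Column("store_id", "string", ""),
    ),
)

if __name__ == "__main__":
    print(render_docs(ORDERS))
    print("Missing descriptions:", undocumented(ORDERS))
```

Because the descriptions and the schema are edited in the same place, a change to one is a natural prompt to update the other.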

As we continue to expand our machine learning (ML), data science (DS) and analytics teams, discovery is a challenge that will continue to grow. We have invested in Amundsen to help solve this problem. We use the tool to document tables, their relationships and their lineage, which makes artifacts searchable and gives folks a starting point, so they don't have to ask a coworker or a channel every time they need new data sources for a project.
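Lineage can be thought of as a graph in which each table points to the tables it is built from. The sketch below walks such a graph to list everything upstream of a given table. It is a plain-Python illustration of the concept and does not use Amundsen's API; the table names are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables it is built from.
LINEAGE = {
    "daily_order_summary": ["orders_clean", "stores"],
    "orders_clean": ["raw_orders"],
    "stores": ["raw_stores"],
    "raw_orders": [],
    "raw_stores": [],
}


def upstream(table, lineage):
    """Return every table that `table` depends on, directly or indirectly.

    Tables are listed in breadth-first order, so direct parents come first.
    """
    seen = []
    queue = deque(lineage.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.append(current)
        queue.extend(lineage.get(current, []))
    return seen


if __name__ == "__main__":
    print(upstream("daily_order_summary", LINEAGE))
```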

 

 

Wish

 

Max Li
Staff Data Scientist • Wish

 

Wish is a mobile e-commerce platform that connects consumers with a wide selection of products delivered directly to them. The company strives to provide an entertaining and affordable shopping experience for its users. When it comes to leveraging data to power Wish’s expansive e-commerce platform, Staff Data Scientist Max Li enjoys the variety in his projects. Data may seem dry to some, but Li finds the work rich and exciting. “What is so interesting is that no two projects are alike,” Li said. 

 

Describe your data stack.

Experimentation is in the DNA of our data-driven culture. The organization continues to allow its data scientists the autonomy to drive complex projects with the support of experienced coworkers. 

On my team specifically, we use the following tools: SQL, Python, Spark and Scala. SQL is the primary tool to query our databases, but I do use Spark for heavy data processing. Scala is useful for building custom Spark functions, while we use Python to build models and put them into production.

 

A DATA WISH COME TRUE: WISH’S STACK

  • SQL
  • Python
  • Spark
  • Scala

 

How does your organization use data?

All employees at Wish use data in their day-to-day work. Every function throughout the organization uses data to make decisions that impact our customers and merchants around the globe — it’s even a pillar of our culture and part of our mission and purpose statements.

All team members at Wish are data gurus. We munge massive data sets, analyze data with scientific rigor, leverage data to address business needs and present arcane data in a way everyone can understand.


 

How has your stack evolved over time?

Throughout my career at Wish, my stack has evolved to be bigger and more data-oriented. Even crazier, it is not uncommon to have a dataset that cannot fit in the memory of a single computer these days.
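When a data set is too large for one machine's memory, a common first step is to stream it and keep only a running summary. The sketch below counts listings per category one row at a time. It is an illustrative assumption, not Wish's code; the column names and sample rows are hypothetical, and an in-memory string stands in for a large file.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Yield one parsed row at a time instead of loading the whole file."""
    yield from csv.DictReader(handle)


def count_by_category(rows):
    """Keep only a running tally, so memory use depends on the number of categories."""
    counts = Counter()
    for row in rows:
        counts[row["category"]] += 1
    return counts


# Hypothetical sample data standing in for a file too large to load at once.
SAMPLE = "listing_id,category\n1,toys\n2,garden\n3,toys\n"
counts = count_by_category(read_rows(io.StringIO(SAMPLE)))
```

Frameworks such as Spark apply the same idea across many machines by splitting the data into partitions and combining the partial results.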

Over the last few years, more and more tools have come out to improve scalability and efficiency by utilizing parallel computing and graphics processing units (GPUs). I believe this trend will continue.

 

 

Nathan Mayer
Business Operations Lead • Aalto

 

Aalto is a real estate tech company that is aiming to build a marketplace where buyers and sellers can connect and accomplish their homeownership goals through a secure, flexible and transparent experience. Business Operations Lead Nathan Mayer spoke to the way Aalto uses data. “Data has a few roles at Aalto, from providing key insights to our users to creating a data-driven culture to automating away manual day-to-day operational tasks by stitching together data sources quickly,” he told Built In San Francisco. 

 

Describe your data stack.

We use Mozart Data and Metabase as the nucleus of our data strategy. We went with Mozart because it’s a managed data pipeline built for users who are SQL-savvy, but may not have a highly technical background. 

Because of this, we didn’t have to hire a data engineering team or stitch together tools from multiple vendors to quickly get meaningful insights from our data. Even better, product engineers have been able to use the same tools to surface key market insights to our buyers and sellers and save time by not building their own data pipelines.

 

HOW AALTO STACKS FOR SUCCESS

  • Mozart Data
  • Metabase

 

How does your organization use data?

What’s unique to us is that, in addition to the data our platform generates, we leverage multiple large public data sets to both provide our end users insights on what’s happening in the market and make business and product decisions. Because of how many ways this data can be used, it becomes important for us to find ways to intelligently leverage rich data. 


 

How has your stack evolved over time?

On the business side, our big evolution was centralizing all our data through Mozart from Stitch + BigQuery, and frankly, we should be set for a while until we have to start considering more AI-intensive use cases. We’ll be focused more on infusing data across the company, not only by making data self-serve and keeping good hygiene so it is always useful, not chaotic, but also by making sure data power users regularly share what they’re working on so we can easily piggyback off learnings, ideas and insights.

 

 

Responses have been edited for length and clarity. Images via listed companies and Shutterstock.