Future-Proofing Tech: How These Engineering Leaders Develop Scalable Software

Three San Francisco engineers proactively build solutions today that meet the demands of tomorrow.

Written by Lucas Dean
Published on Oct. 20, 2023

Scalability isn’t just a buzzword — it’s a crucial survival factor that holds sway over tech companies’ long-term viability.  

For both startups experiencing astronomical growth and established organizations encountering a surge of new users, systems that aren’t built to sustain growth could buckle under the pressure of swelling demand.

When an architect designs a structure in an area prone to earthquakes, they ensure the building has the necessary flexibility and foundation to withstand future seismic activity. Likewise, software engineers must create elastic, anticipatory infrastructure to handle unexpected — and expected — traffic without collapsing. 

Scalability is, after all, about building a solid foundation made not only for present demands but also for future expectations. By considering evolving user behaviors, new technologies and increasing data loads, engineers can ensure continuity of service and optimal experiences.

Engineers at Cruise, Block and Roblox explained why scalability matters for their respective technologies and detailed what guides their efforts to build reliable software.    
 

Deena Donia
Senior Director of Software Engineering, Cloud Infrastructure & SRE • Cruise

Cruise builds all-electric, self-driving vehicles that help passengers get to where they need to go. 

 

Why is scalability important for the technology you’re building?

I lead Cruise engineering’s cloud infrastructure and SRE teams. When I think about scalability, I think of how it applies to the systems and tools Cruise relies on to operate as a successful business, and to the people and teams who build and maintain those systems.

My team’s charter is to ensure we have a highly available, fully functional set of services for the more than 2,000 engineers across AI, security, infrastructure, robotics and product engineering at Cruise. We must design and build systems that stay well ahead of the load driven by several inputs: the number of end customers, AVs in the fleet, Cruise software engineers, cities we operate in and more. Cruise’s projected growth demands that we build systems that effortlessly and seamlessly scale up and out.

 


 

When I think about scalability from a team perspective, our engineering team’s work simply cannot scale linearly with Cruise’s business growth. We must build infrastructure and tooling that allows teams to do their job efficiently and quickly as we grow.  

 

How do you build this tech with scalability in mind?

We regularly review and update the Cruise long-term plan to understand our projected scaling curve and then get ground truth on how our current and future cloud services do or do not support that scale. Our daily work focuses on closing the scalability and reliability gaps. Further, we have created system design principles centered around resiliency and reliable scalability. These principles include designing systems for modularity, testability, observability and automated failover, such that Cruise can scale up and out to meet our business goals.

 

What tools or technologies does your team use to support scalability?

My team designs and builds a host of “infrastructure as a service” tools to ensure we’re making it easier for every engineer at Cruise to develop and operate their service in the most reliable, efficient and scalable way possible. 

One of the tools we develop and use daily is release automation, which ensures efficient, predictable and automated cloud service rollouts and rollbacks. Another is our global persistent storage systems, which provide performance coupled with high availability. Our multi-region cloud runtime architecture and systems operate cloud services in multiple geographies with load-balancing and failover capabilities.

Autoscaling, load balancing and load shedding help automatically scale compute and storage resources on demand and balance load across available resource pools. Runtime-distributed compute systems like Juno, Statig, Istio and Service Mesh are the compute environments for all of Cruise. Service owners can use observability and monitoring tools to know exactly what is happening in cloud-based services. Service-level objective monitoring tools ensure we have an accurate signal of how cloud-based services are performing.
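For readers unfamiliar with the terms, autoscaling and load shedding can be illustrated with a minimal Python sketch. Nothing here reflects Cruise’s actual systems: the function `desired_replicas`, the class `LoadShedder` and all numbers are hypothetical. The scaling rule is a simple proportional one, similar in spirit to the formula documented for Kubernetes’ Horizontal Pod Autoscaler, and the shedder simply refuses new work once a fixed number of requests is in flight.

```python
def desired_replicas(current_replicas: int, current_util_pct: int, target_util_pct: int) -> int:
    """Proportional autoscaling rule (hypothetical helper).

    Scales the replica count by observed/target utilization, rounding up.
    Utilization is given in whole percent so the math stays in integers.
    """
    if current_replicas < 1 or target_util_pct <= 0:
        raise ValueError("need at least one replica and a positive target")
    scaled = current_replicas * current_util_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    return max(1, -(-scaled // target_util_pct))


class LoadShedder:
    """Rejects new requests once too many are already in flight (hypothetical)."""

    def __init__(self, max_in_flight: int) -> None:
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_admit(self) -> bool:
        # Shed load instead of queueing without bound when saturated.
        if self.in_flight >= self.max_in_flight:
            return False
        self.in_flight += 1
        return True

    def release(self) -> None:
        # Called when a request finishes, freeing one slot.
        self.in_flight = max(0, self.in_flight - 1)


if __name__ == "__main__":
    # Four replicas running at 90% against a 60% target suggests scaling out.
    print(desired_replicas(4, 90, 60))
    shedder = LoadShedder(max_in_flight=2)
    print([shedder.try_admit() for _ in range(3)])
```

The two mechanisms complement each other: autoscaling adds capacity over minutes, while shedding protects the service in the seconds before that capacity arrives.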

 

 

Alejandro Salinas
Head of Connectivity and Physical Infrastructure • Block

Block, composed of Square, Cash App, Spiral, TIDAL and TBD, is a global fintech company aiming to improve access to economic and blockchain technologies.

 

Why is scalability important for the technology you’re building?

Scalability means that we can meet not only the current demands of the business but also future or fluctuating demand in an elastic, cost-efficient way. We provide a platform for our merchants and customers who rely on us to conduct their business and financial activities, so scalability is key for us to support them reliably.

 

How do you build this tech with scalability in mind?

Scalability has both technical and people dimensions. On the technical dimension, at Block, we build our systems by embracing principles that allow them to sustain increased demand or bursts, such as load distribution and compartmentalization. Microservices, availability zones, separate control planes and asynchronous processes are some examples of these principles. We also augment our capacity through third parties and auto-scaling setups.
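As a rough illustration of how asynchronous processing and load distribution absorb a burst, here is a minimal Python sketch. It is not Block’s implementation: the names `worker` and `process_burst`, the worker count and the queue size are all assumptions made for the example. A bounded queue sits between the caller and a small pool of workers, so a spike in jobs is spread across workers, and a full queue slows the producer down rather than overwhelming the system.

```python
import asyncio


async def worker(name: str, queue: asyncio.Queue, results: list) -> None:
    """Drains jobs from the shared queue until cancelled (hypothetical worker)."""
    while True:
        job = await queue.get()
        try:
            await asyncio.sleep(0)  # stand-in for real I/O such as a network call
            results.append((name, job))
        finally:
            queue.task_done()


async def process_burst(jobs, num_workers: int = 3, max_queue: int = 100) -> list:
    """Spreads a burst of jobs across workers through a bounded queue."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
    results: list = []
    workers = [
        asyncio.create_task(worker(f"w{i}", queue, results))
        for i in range(num_workers)
    ]
    for job in jobs:
        # put() waits when the queue is full, which applies backpressure.
        await queue.put(job)
    await queue.join()  # wait until every queued job has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


if __name__ == "__main__":
    done = asyncio.run(process_burst(range(10)))
    print(len(done), "jobs processed")
```

Because the caller only enqueues work, it stays responsive during a spike; throughput can then be raised by adding workers without changing the caller.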

On the people dimension, our teams own and operate our tech, so they also must scale their capabilities and reach to match our business needs. Automation is key to using our engineering talent where it really matters. Automation will also increase reliability and force you to standardize, which will, in turn, help you to scale better. Last but not least, leveraging third parties can be seen as a kind of staff augmentation, reducing the operational burden on our team and allowing them to focus on the next most important thing.

 


 

What tools or technologies does your team use to support scalability?

We are leveraging cloud providers’ product offerings to build new systems and replace existing in-house ones. We also leverage CDN providers to expand our global edge and increase our capacity, protection and overall user experience. Using different providers requires integration work, and for the things we build internally, we try to use as much open source and standard tooling as we can, as that allows us to scale in terms of finding knowledge in the market and speeds up staff augmentation and hiring.

 

 

Ella Li
Director, Technical Program Management • Roblox

Roblox is a gaming platform and development system that connects over 65 million users around the world to immersive 3D experiences. 

 

Why is scalability important for the technology you’re building?

Since joining Roblox several years ago, I’ve gotten the chance to tackle various scale challenges across our engine and infrastructure groups. It’s widely known that scalability generally means a system’s capability to go big, such as taking on more users or handling larger workloads. But to me, scale at Roblox also means addressing the little things that improve the user experience for everyone on the platform as we grow together.

Some examples of the work we are doing to achieve this include making our experiences much faster to load to reduce wait time and making our experiences available across platforms to help people enjoy them at any time on the devices they choose. Ultimately, our focus is on building the platform to enable the creation of any experience we can imagine.

 

How do you build this tech with scalability in mind?

We start with principles. While there are a lot of techniques out there, such as modularization, level of detail (LOD), caching and so on, scale is usually a hard problem, as there are many factors to consider across a wide range of devices. Roblox prioritizes our community above everything else. We always keep them in mind as we decide on our architecture design and how we execute it daily.

Roblox also pushes for long-term thinking. What does the solution look like when we grow five times, 10 times, 100 times and so on? We ask each other the hard questions and iterate on our designs and solutions, guided by the principles. In addition, solving those challenges drives innovation, such as figuring out how to improve performance and efficiency simultaneously so that our creators build better experiences and earn more in the process.
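The "five times, 10 times, 100 times" question can be made concrete with a back-of-the-envelope capacity projection. The Python sketch below is purely illustrative and uses made-up figures, not Roblox data: `instances_needed`, `projection`, the 30 percent headroom and the request rates are all assumptions.

```python
def instances_needed(peak_rps: int, per_instance_rps: int, headroom_pct: int = 30) -> int:
    """Instances required to serve peak load plus spare headroom (hypothetical)."""
    if per_instance_rps <= 0:
        raise ValueError("per-instance capacity must be positive")
    required = peak_rps * (100 + headroom_pct)
    capacity = per_instance_rps * 100
    # Integer ceiling division: a partially used instance still counts as one.
    return -(-required // capacity)


def projection(current_peak_rps: int, per_instance_rps: int,
               multipliers=(1, 5, 10, 100)) -> dict:
    """Maps each growth multiplier to the instance count it would require."""
    return {
        m: instances_needed(current_peak_rps * m, per_instance_rps)
        for m in multipliers
    }


if __name__ == "__main__":
    # Assumed figures: 1,000 requests/s at peak today, 250 requests/s per instance.
    for multiplier, count in projection(1000, 250).items():
        print(f"{multiplier}x growth -> {count} instances")
```

A linear projection like this is only a starting point. Its value is in exposing which assumptions, such as per-instance capacity or a shared database, would stop holding at the larger multipliers.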

 


 

What tools or technologies does your team use to support scalability?

We have many. To name a few, we stream 3D content directly to our clients, enabling large, fast-to-join, more performant and stable experiences. We integrate fluid force systems into our engine to enable creators to add realistic aerodynamic forces to all parts in an experience at scale. We also build large worlds to test our latest technologies. 

 

 

Responses have been edited for length and clarity. Images provided by Shutterstock and listed companies.